Specific challenges affecting the design of next-generation communication SoC devices include timing closure of large digital blocks, signal integrity in both the analog and digital sections, power distribution and the database limitations imposed by 32-bit tools.
To illustrate the challenges, in this article we detail a transceiver add/drop multiplexer (TADM) SoC design used in high-speed linear and protected-ring optical networks.
The chip supports complex Sonet/SDH protocols and allows lower-speed channels to be extracted from and/or injected into the high-speed synchronous network at speeds up to 2.5 Gbits/s. The chip contains more than 7 million gates of logic and makes extensive use of serdes I/O (high-speed serial data with embedded clock).
The design challenges experienced in developing the device are a direct result of the following changes in the design environment:
Problems that are not identified until chip-level physical layout is complete lead to long verify-modify-redesign-reassemble-reverify loops that consume enormous amounts of design time.
In this article, we examine the challenges that occurred in the design of the TADM chip in light of the capabilities and limitations of the tools available to us at the time. We also outline new tools and capabilities that will be needed if designers are to continue to implement large SoC designs in a timely and cost-effective manner.
The TADM Sonet/SDH interface device provides a versatile solution for quad OC-3, quad OC-12 and single OC-48 linear and ring datacom/telecom applications. In essence, it is an add-drop multiplexer and framer for use in high-speed linear and protected-ring optical networks. The device provides complete encapsulation and de-encapsulation for packet and ATM streams into and out of Sonet/SDH payloads. It is intended for use in voice gateways, VoIP, next-generation digital subscriber line access multiplexers (DSLAMs), Gigabit Ethernet-over-Sonet and other applications.
Built in 1.5-V, 0.16-micron CMOS technology, the device incorporates integrated Sonet/SDH framing, section/line/path termination, pointer-processing, cross-connect and data-engine blocks. Communication with the TADM device is provided by a generic microprocessor interface with separate address and data buses. The external clock is 78 MHz (see figure, page 12).
As can be seen in the block diagram, the TADM connects to the Sonet network through an optical transponder. Mate is a proprietary interface that connects to a second device having a separate network interface, for protection switching. The overhead processor terminates and processes Sonet overhead information. The pointer processor uses payload pointers and stuffing bytes to compensate for frequency and phase offsets in the synchronous hierarchy. A synchronous transport signal (STS) cross-connect switches 51-Mbit/s streams among the network, Mate, STM (external synchronous cross-connect) and Utopia (the industry-standard ATM interface) ports. The data processor maps ATM cells and HDLC and Ethernet packets into and out of the Sonet framing structure. The Mate connection is via multiple 622-MHz serdes.
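The pointer processor's basic job can be illustrated with a toy model. The sketch below is not the TADM's implementation; it assumes an STS-1 payload envelope of 783 bytes (so pointer offsets run from 0 to 782) and an elastic buffer whose fill level, compared against hypothetical high and low thresholds, decides once per frame whether to justify. When payload arrives fast and the buffer fills, a negative justification carries one extra payload byte in the H3 byte and the pointer decrements; when payload arrives slow and the buffer drains, a positive justification inserts a stuff byte and the pointer increments. Real pointer processors also pace adjustments and signal them through the H1/H2 bytes, which this sketch omits.

```python
from dataclasses import dataclass

SPE_BYTES = 783       # an STS-1 synchronous payload envelope holds 783 bytes
POINTER_RANGE = 783   # so valid pointer offsets run from 0 to 782


@dataclass
class PointerProcessor:
    """Toy STS-1 pointer generator driven by elastic-buffer fill level.

    The thresholds are hypothetical; real devices also pace adjustments
    and signal them through the H1/H2 bytes, which is omitted here.
    """
    pointer: int = 0
    fill: int = 8             # bytes currently held in the elastic buffer
    high_threshold: int = 12
    low_threshold: int = 4

    def frame(self, bytes_in: int, bytes_out: int = SPE_BYTES) -> str:
        """Advance one frame and return the justification performed."""
        self.fill += bytes_in - bytes_out
        if self.fill > self.high_threshold:
            # Payload arriving fast: carry one extra payload byte in H3
            # (negative justification); the payload starts one byte
            # earlier, so the pointer decrements.
            self.fill -= 1
            self.pointer = (self.pointer - 1) % POINTER_RANGE
            return "negative"
        if self.fill < self.low_threshold:
            # Payload arriving slow: send a stuff byte after H3
            # (positive justification); the payload starts one byte
            # later, so the pointer increments.
            self.fill += 1
            self.pointer = (self.pointer + 1) % POINTER_RANGE
            return "positive"
        return "none"


fast = PointerProcessor()
print([fast.frame(784) for _ in range(6)], fast.pointer)
```

In this toy run, a one-byte-per-frame surplus (far larger than any real clock offset, chosen so the effect shows up within a few frames) lets the buffer climb to its high threshold, after which every frame carries a negative justification and the pointer steps downward, wrapping from 0 to 782.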
Although not the largest of ASICs by raw gate count, this is a very complex device. It includes some 7 million gates of random logic (logic without significant regular structure, such as datapaths) and 500 kbits of memory, and it features high-speed serial I/O. At the top level, it contains a few large heterogeneous blocks, which can be further broken down into smaller heterogeneous blocks and elements. Even the memory comprises many small register blocks, typically 256 × 32 bits, distributed throughout the chip. The result is that at each stage of the design, the device generates a very large amount of specification and design data, far beyond what its gate count might indicate.
Timing closure
This was the first device that the group had designed in a 0.16-micron process. Designers noted a dramatic difference in the difficulty of achieving timing closure as they went from 0.25 micron down to 0.16 micron. Wire resistance had become a significant factor, and the simple wire-load models used by the synthesis tool, which estimate wire length statistically from fanout rather than from actual placement, were no longer accurate. As a result, the design as initially synthesized fell far short of the required speed, especially on the long global routes used to connect blocks at the chip level. In addition, slow paths could no longer be fixed simply by increasing the size of the driver. It was now necessary to insert buffers along the route to achieve the required performance.
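A first-order Elmore delay estimate shows why upsizing the driver stopped working. For an unbuffered wire, the wire's own distributed RC contributes a term that grows with the square of its length, and a larger driver shrinks only the driver-resistance term. Splitting the route into segments with repeaters makes the total grow roughly linearly with length. The sketch below is illustrative only: every resistance, capacitance and delay value in it is a hypothetical round number, not data from the 0.16-micron process, and the Elmore model ignores slew and inductance.

```python
# First-order Elmore model of a long on-chip wire, with and without repeaters.
# All parameter values are hypothetical, chosen only for illustration.
R_PER_MM = 200.0      # wire resistance, ohms per mm
C_PER_MM = 0.2e-12    # wire capacitance, farads per mm
R_BUF = 500.0         # output resistance of a driver or repeater, ohms
C_BUF = 20e-15        # input capacitance of a repeater, farads
T_BUF = 50e-12        # intrinsic delay of each repeater, seconds
C_LOAD = 20e-15       # receiver input capacitance, farads


def stage_delay(r_drv: float, length_mm: float, c_next: float) -> float:
    """Elmore delay: driver resistance + distributed RC segment + lumped load."""
    r_w = R_PER_MM * length_mm
    c_w = C_PER_MM * length_mm
    return r_drv * (c_w + c_next) + r_w * (c_w / 2 + c_next)


def wire_delay(length_mm: float, segments: int, r_src: float = R_BUF) -> float:
    """Delay of a wire split into `segments` pieces by segments-1 repeaters."""
    seg = length_mm / segments
    total = 0.0
    for i in range(segments):
        r_drv = r_src if i == 0 else R_BUF           # first stage uses the source driver
        c_next = C_LOAD if i == segments - 1 else C_BUF
        total += stage_delay(r_drv, seg, c_next)
    return total + T_BUF * (segments - 1)            # add repeater intrinsic delays


if __name__ == "__main__":
    L = 10.0  # a 10-mm global route
    print(f"unbuffered:        {wire_delay(L, 1) * 1e9:.2f} ns")
    print(f"10x larger driver: {wire_delay(L, 1, r_src=R_BUF / 10) * 1e9:.2f} ns")
    best = min(range(1, 11), key=lambda k: wire_delay(L, k))
    print(f"{best - 1} repeaters:       {wire_delay(L, best) * 1e9:.2f} ns")
```

With these assumed values, the 10-mm route drops from about 3 ns unbuffered to about 2.1 ns with a ten-times-stronger driver, because the wire's own RC term is untouched, while five repeaters bring it to roughly 1.7 ns.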
In some designs, it is possible to solve timing problems by using a deep hierarchy in combination with strict pipelining at block boundaries. The hierarchy effectively confines timing problems to a smaller part of the overall chip. But it also limits the effectiveness of place and route by constraining the placement of cells within their parent blocks. Also, in communications applications, latency is often a critical design constraint. In the Utopia interface, for example, handshake times are strictly prescribed and bounded, and adding extra clock delays would violate those constraints. Another problem with adding pipeline stages late in the design cycle is that it shifts cycle-by-cycle behavior and so invalidates all the existing test vectors.
The end result was that some modules required 12 to 14 iterations of synthesis, placement, routing, extraction and timing estimation to achieve closure. At the module level, that was frustrating; at the chip level it was a nightmare. Each iteration at the chip level took about one week in tool run-time alone.
The industry's move to smaller geometries and more routing layers has also resulted in layouts that exhibit "antenna violations." These occur when a multilayer route is constructed in such a way that, at some point during the fabrication sequence, a long wire fragment is connected to a gate input before it is connected to its driving gate's output. The long wire can then collect enough charge during processing (for example, during plasma etching) to break down the thin gate oxide and destroy the device before it even emerges from fabrication.
The solution is to add diodes along the route, which give the accumulated charge a safe discharge path and so prevent premature breakdown of the gate oxide. These antenna violations are not discovered until final physical-layout verification. Correcting them is yet another time-consuming back-end process that leads to physical redesign and thus to design delays.
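Layout verification finds these violations by comparing, for each metal layer, the area of conductor that is attached to a gate but not yet to any diffusion when that layer is etched against the gate area it connects to. The sketch below is a simplified per-layer check under stated assumptions: the net record, the net name rx_data_3, the layer names and the MAX_RATIO limit are all hypothetical; real limits come from the foundry's rule deck and usually include cumulative and side-wall variants.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_RATIO = 400.0  # hypothetical per-layer limit: floating metal area / gate area


@dataclass
class Net:
    name: str
    gate_area: float                  # gate-oxide area connected to the net, um^2
    floating_metal: dict[str, float]  # layer -> metal area (um^2) tied to the gate
                                      # but not yet to any diffusion when that layer is etched
    has_diode: bool = False           # a diode ties the net to diffusion at the lowest level


def antenna_violations(net: Net) -> list[tuple[str, float]]:
    """Return (layer, ratio) for every layer whose ratio exceeds MAX_RATIO."""
    if net.has_diode:
        return []  # the diode gives collected charge a discharge path on every layer
    ratios = ((layer, area / net.gate_area) for layer, area in net.floating_metal.items())
    return [(layer, r) for layer, r in ratios if r > MAX_RATIO]


def fix_with_diode(net: Net) -> Net:
    """Return a copy of the net with a protection diode if it has any violation."""
    if antenna_violations(net):
        return Net(net.name, net.gate_area, dict(net.floating_metal), has_diode=True)
    return net


net = Net("rx_data_3", gate_area=0.5, floating_metal={"M1": 20.0, "M2": 350.0})
print(antenna_violations(net))                  # M2 ratio is 700, over the limit
print(antenna_violations(fix_with_diode(net)))  # no violations once the diode is added
```

The check itself is cheap; the cost the article describes comes from when it runs, since only final layout knows which layer completes each connection.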
It is good that we have tools that can accurately analyze the final product and catch errors, but relying on them to correct the errors caused by poor timing models in the front end causes significant project delays because of lengthy turnaround times. We need solutions that are correct by construction, not correct by iteration.
As wires become thinner, their resistance per unit length increases, leading to weaker drive currents and slower rise and fall times. As the spacing between wires shrinks, the coupling capacitance between them increases. The result is more crosstalk, which shows up as slower timing (a receiver must wait until a transition has reliably completed); in more serious cases, as incorrect values (even on lines that are not in transition); and as extra transitions on clock lines that lead to incorrect function.
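A common textbook approximation captures both symptoms. When a victim wire switches, its coupling capacitance to a neighbor counts roughly 0 times if the neighbor switches the same way, once if the neighbor is quiet and twice if it switches the opposite way, so delay depends on neighbor activity. When the victim is quiet, charge sharing between the coupling capacitance and the victim's ground capacitance bounds the glitch a switching neighbor can induce. The sketch below applies those approximations to hypothetical capacitance values; because it ignores the victim driver's holding resistance, its glitch figure is a pessimistic upper bound.

```python
VDD = 1.5  # supply voltage of the article's 1.5-V process, volts


def miller_factor(victim: int, aggressor: int) -> int:
    """Switching factor for coupling capacitance.

    Transitions are +1 (rising), -1 (falling) or 0 (quiet). Same-direction
    switching gives 0, a quiet aggressor gives 1, opposite switching gives 2.
    """
    return 1 - victim * aggressor


def effective_load_ff(c_ground_ff: float, c_couple_ff: float,
                      victim: int, aggressor: int) -> float:
    """Capacitance the victim's driver effectively has to charge, in fF."""
    return c_ground_ff + miller_factor(victim, aggressor) * c_couple_ff


def glitch_peak_v(c_ground_ff: float, c_couple_ff: float, vdd: float = VDD) -> float:
    """Charge-sharing upper bound on noise coupled onto a quiet, undriven victim."""
    return vdd * c_couple_ff / (c_couple_ff + c_ground_ff)


# Hypothetical wire: 40 fF to ground, 30 fF to one neighbor.
for aggressor, label in ((1, "same direction"), (0, "quiet"), (-1, "opposite")):
    print(f"{label:>14}: {effective_load_ff(40.0, 30.0, 1, aggressor):.0f} fF")
print(f"glitch on quiet victim: up to {glitch_peak_v(40.0, 30.0):.2f} V")
```

With these assumed values, the same rising victim sees anywhere from 40 fF to 100 fF depending on what its neighbor does, which is why crosstalk appears first as data-dependent timing.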
Signal integrity
The tools available to handle signal integrity problems such as crosstalk typically provide too little information, too late in the design process. Crosstalk analysis tools operating on the final physical TADM layout generated tens of thousands of warnings, of which only 10 to 20 were important. But determining which 10 to 20 was a task that required human judgment, was enormously time-consuming and, of course, was error-prone. Once again, we had a situation in which designers didn't discover problems until the end of the design process, at which time changes were very costly. We need tools that generate correct solutions based upon shielding, spacing and added drivers, far earlier in the design process.
For example, we need intelligent routers that can anticipate and avoid crosstalk problems. Particular care needs to be applied to buses. Solutions might include deliberately separating bus routes and skewing data transitions on adjacent lines. Once again, the objective is correct by construction.
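The benefit of skewing adjacent transitions can be estimated with the usual Miller approximation, in which coupling to a neighbor counts 0, 1 or 2 times depending on whether the neighbor switches the same way, stays quiet or switches the opposite way. The sketch below enumerates every transition pattern on a small bus and reports the worst effective load with and without an odd/even skew. The capacitance values are hypothetical, and it assumes the skew is longer than a transition, so a switching bit always sees its neighbors (which have the opposite parity) as quiet.

```python
from itertools import product

C_GROUND_FF = 40.0  # hypothetical wire-to-ground capacitance, fF
C_COUPLE_FF = 30.0  # hypothetical coupling capacitance to each adjacent wire, fF


def worst_case_load_ff(width: int, skew_odd_bits: bool) -> float:
    """Worst effective load seen by any switching bit, over all transition patterns.

    Each bit is -1 (fall), 0 (hold) or +1 (rise). Coupling to a neighbor is
    weighted by 1 - v * a: 0 if both switch the same way, 1 if the neighbor
    is quiet, 2 if they switch in opposite directions. With skew_odd_bits,
    odd and even bits are assumed to switch far enough apart that a neighbor
    (always of the opposite parity) is quiet when a bit switches.
    """
    worst = 0.0
    for pattern in product((-1, 0, 1), repeat=width):
        for i, v in enumerate(pattern):
            if v == 0:
                continue  # a holding bit has no transition delay to worry about
            load = C_GROUND_FF
            for j in (i - 1, i + 1):
                if 0 <= j < width:
                    a = 0 if skew_odd_bits else pattern[j]
                    load += (1 - v * a) * C_COUPLE_FF
            worst = max(worst, load)
    return worst


print(worst_case_load_ff(8, skew_odd_bits=False))  # neighbors may switch opposite ways
print(worst_case_load_ff(8, skew_odd_bits=True))   # neighbors appear quiet
```

Under these assumptions, skewing cuts the worst-case load from 160 fF to 100 fF. The cost is that each late-switching bit now disturbs a neighbor that has already settled, so the technique only helps when that glitch fits within the receiver's noise margin.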
In the next generation of this communications device, serdes speed will increase from 622 Mbits/s to 2.5 Gbits/s or maybe 5 Gbits/s. At those speeds, substrate noise from the digital logic will pose a threat to the sensitive phase-locked loops and VCOs in the serdes modules. Although we have tools for carefully analyzing substrate noise in small circuits, we do not have good tools for estimating the impact of these effects in a large digital design. And we certainly have no tools to give us any early predictive guidance in preparing a floor plan that will mitigate such effects.
Besides signal integrity, decreasing IC operating voltages pose their own challenges. For a given power, supply current rises as voltage is reduced, and voltage drop becomes more of an issue. For example, the TADM consumes 8 watts, which at 1.5 V means it draws more than 5 amps of supply current. Careful design of the power grid is needed to avoid significant voltage drops, which can prevent a module from running at speed and can reduce noise margins, making the module more susceptible to crosstalk.
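The current figure follows from I = P/V, and the resulting drop along the grid can be sketched with a deliberately crude model: a single power strap fed from one end, with equal loads tapped at equal spacing. The strap current, number of taps and segment resistance below are hypothetical; a real power-grid analysis solves a large resistive mesh.

```python
def supply_current_a(power_w: float, vdd_v: float) -> float:
    """Average supply current drawn at a given power and supply voltage."""
    return power_w / vdd_v


def far_end_drop_v(strap_current_a: float, taps: int, r_segment_ohm: float) -> float:
    """IR drop at the far end of a power strap fed from one end.

    The strap has `taps` equal loads at equal spacing; segment k (1-based,
    counted from the feed) carries the current of taps k..taps.
    """
    i_tap = strap_current_a / taps
    return sum(r_segment_ohm * i_tap * (taps - k + 1) for k in range(1, taps + 1))


print(f"TADM supply current: {supply_current_a(8.0, 1.5):.2f} A")
# Hypothetical strap: 100 mA spread over 10 taps, 0.1 ohm per segment.
drop = far_end_drop_v(0.1, taps=10, r_segment_ohm=0.1)
print(f"far-end drop: {drop * 1000:.0f} mV ({drop / 1.5:.1%} of 1.5 V)")
```

At 8 W and 1.5 V, the first line gives about 5.3 A. For the assumed strap, the far-end drop is about 55 mV, a few percent of the supply; since the drop scales with the current the strap carries, lower supply voltages (more current for the same power, and less headroom) make the grid harder to design.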
Memories can be especially sensitive to voltage drops, particularly around their sense amplifiers. Understanding potential voltage drops inside the chip is difficult until late in the design process, when detailed power/ground routing is available. Area-based flip-chip packaging with dedicated power- and ground-routing planes may be a solution.
All of these challenges add to the already large amount of design data that must be processed.
When we moved to 32-bit machines in the 1980s, it was hard to imagine that we would ever exhaust a 32-bit address space. But that is exactly what is happening in complex designs today. Workstations and operating systems typically limit files to 2 Gbytes, the largest size a signed 32-bit file offset can address. The TADM design team often had to run a privileged window (privileged in the OS sense, for example, as Unix super-user) in order to get 1 more bit of address space. The solution will be tools that run on 64-bit machines.
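The 2-Gbyte ceiling is what a signed 32-bit file offset allows: its largest value is 2^31 - 1 bytes. The sketch below uses Python's ctypes fixed-width integers, which wrap silently instead of raising an error, to show what happens when an offset of exactly 2 Gbytes is forced into such a field. It illustrates the arithmetic only and does not model how any particular tool or operating system stores offsets.

```python
import ctypes

LIMIT = 2**31 - 1  # largest value of a signed 32-bit integer: 2,147,483,647


def as_int32(offset: int) -> int:
    """Store an offset in a signed 32-bit field (ctypes does no overflow check)."""
    return ctypes.c_int32(offset).value


def as_int64(offset: int) -> int:
    """Store the same offset in a signed 64-bit field."""
    return ctypes.c_int64(offset).value


two_gb = 2 * 1024**3      # 2 Gbytes = 2**31 bytes
print(as_int32(LIMIT))    # last byte offset a 32-bit field can hold
print(as_int32(two_gb))   # wraps to a negative number
print(as_int64(two_gb))   # fits comfortably in a 64-bit offset
```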
File size also represents a major problem for today's globally distributed design teams. For example, it takes a minimum of three hours to transmit a 2-Gbyte file over a T1 line (1.544 Mbits/s). Two of our design teams were located about 100 miles apart. We discovered that it was quicker to drive a tape from one location to the other than to send a large design file electronically. That problem will, of course, only get worse as we integrate still larger systems on chip. Perhaps the solution will be design teams that operate remotely from a single set of centralized database/compute servers.
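The three-hour figure is straightforward arithmetic. The sketch below assumes the full 1.544-Mbit/s T1 line rate with no protocol overhead and a 2-Gbyte file of 2^31 bytes; the 60-mph average driving speed used for the comparison is a hypothetical round number.

```python
T1_BPS = 1.544e6           # T1 line rate, bits per second
FILE_BYTES = 2 * 1024**3   # a 2-Gbyte design file


def transfer_hours(size_bytes: int, rate_bps: float) -> float:
    """Time to send a file at a given line rate, ignoring protocol overhead."""
    return size_bytes * 8 / rate_bps / 3600


def drive_hours(miles: float, mph: float = 60.0) -> float:
    """Time to drive a tape a given distance at an assumed average speed."""
    return miles / mph


print(f"T1 transfer:    {transfer_hours(FILE_BYTES, T1_BPS):.1f} h")  # about 3.1 h
print(f"100-mile drive: {drive_hours(100):.1f} h")                    # about 1.7 h
```

Under those assumptions the transfer takes just over three hours, while the 100-mile drive takes under two, and a tape can carry several such files at once.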
The challenges of designing next-generation communications SoCs are so important that they are addressed at a number of sessions at this year's Design Automation Conference (Table 2).
---
Bryan Ackland is vice president of communications system technology for Agere Systems (Allentown, Pa.) and chairman of the 39th Design Automation Conference. A Bell Labs fellow and IEEE fellow, he holds a PhD in EE from the University of Adelaide, Australia. Eugene Scuteri is product line director. He holds a BSEE from New Jersey's Fairleigh Dickinson University and an MSEE degree from New York Polytechnic University.
http://www.isdmag.com
Copyright © 2002 CMP Media LLC
6/1/02, Issue # 14156, page 10.